35 research outputs found

    Linguistic and orthographical classic Portuguese variants. Challenges for NLP

    In recent years, considerable investment has gone into transferring ancient Portuguese texts from physical to digital media. This transfer not only makes the texts accessible to the general public but also allows them to be read and processed by machines. NLP tools mainly target contemporary Portuguese, and applying NLP to classic texts poses several difficulties. Building large lexical corpora of forms that predate modern Portuguese is an opportunity for a multidisciplinary field of study: it broadens linguistic research and makes it possible to obtain, through NLP, validated corpora, collections and ontologies that can serve as input to NLP tools for ancient Portuguese texts. In this work we briefly present the problem of lexical variation of forms in the processing of classic Portuguese texts, the challenges that emerge from it, and future perspectives of work.

    A Prosodia de Bento Pereira: contributos para o estudo lexicográfico e filológico

    Doctorate in Portuguese Linguistics. This dissertation studies the lexicographical set of Bento Pereira's Prosodia and the recovery of its text. The full dictionary text was transcribed into a fully editable digital format. We examine the work's lexicographical characteristics and bibliographical history, and clarify the editorial history of this set of dictionaries, about which some discrepancies have persisted. A large part of the work is devoted to the Portuguese lexicon of this editorial set: a large corpus of 46 067 non-lemmatized Portuguese forms with numerous spelling variants. Some statistics on the Portuguese and Latin subcorpora are presented. Comparing these corpora also reveals evidence of a relatinization of Portuguese and of the transfer of very productive Latin suffixes. Some aspects of the Portuguese lexicon that mark its diachronic development are shown, together with notes on word formation through the growing availability of the suffixal system.

    As Memórias Paroquiais: do manuscrito ao digital

    This text traces the history of the custody of the Parish Memories ("Memórias Paroquiais"), from the distribution of the surveys in 1758 to the current projects that aim to convert them into digital objects and data. Reflecting on this itinerary is also a way to evaluate and rethink working strategies for this collection. The collection is a relevant resource for understanding mid-eighteenth-century Portugal, of interest not only to historians but also to many other scholars and actors in many fields. Work carried out within the projects UIDB/00057/2020 and PTDC/ART-HIS/32327/2017 (FCT, Portugal).

    Planear a normalização automática: tipologia de variação gráfica do corpus das Memórias Paroquiais (1758)

    Digital Humanities are now essential for studies on large-scale textual corpora, where turning text into processable data about linguistic phenomena requires a multidisciplinary treatment. In this article we present a Digital Humanities approach applied to an 18th-century Portuguese textual corpus, gathered from a set of documents of high historical and heritage value known as Memórias Paroquiais [“The Parish Memoirs”]. We highlight some characteristics of how the corpus was constituted and questions raised by the considerable spelling variation found in the texts. We propose a typology as a step towards a future automatic normalization of this textual corpus. FCT - Portugal - UIDB/00057/202

    3economy+ GLOSSARY. A corpus-based trilingual terminology dictionary on international economy, marketing and tourism

    This work results from the European Project 3Economy+ (Project number: 2017-1-ES01-KA203-038141), funded by the European Commission (Action KA203 - Strategic Partnerships for Higher Education). The European Commission's support for the production of this publication does not constitute endorsement of the contents, which reflect the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein. The present volume includes 200 lexical units, mainly words, selected as a representative glossary of the 3Economy+ project, which could well be extrapolated to the wider field of expertise of the international economy. The first section of this publication outlines the entire lexicographic process that led to the design of the specialised glossary. The second part contains the glossary with three alphabetical indexes, one for each language: English, Spanish and Portuguese. Through these indexes the reader reaches the descriptive record of each of the 200 entries extracted from the corpus, with their equivalents in the three languages of the consortium as well as complementary information. The last section provides a set of activities for practising the words in each language and for working on the equivalents in the three languages in combination.

    Digital Humanities and Portuguese Processing: a research pathway

    This paper reflects on the whole path of work in digital humanities, in the light of the text-processing projects under development at CIDEHUS. These projects deal with a rich heritage related to Portuguese culture, history and language. The paper considers the many challenges to be faced and how NLP techniques may broaden the capabilities for organising and sharing knowledge related to these resources.

    Pervasive gaps in Amazonian ecological research

    Biodiversity loss is one of the main challenges of our time,1,2 and attempts to address it require a clear understanding of how ecological communities respond to environmental change across time and space.3,4 While the increasing availability of global databases on ecological communities has advanced our knowledge of biodiversity sensitivity to environmental changes,5–7 vast areas of the tropics remain understudied.8–11 In the American tropics, Amazonia stands out as the world’s most diverse rainforest and the primary source of Neotropical biodiversity,12 but it remains among the least known forests in America and is often underrepresented in biodiversity databases.13–15 To worsen this situation, human-induced modifications16,17 may eliminate pieces of the Amazon’s biodiversity puzzle before we can use them to understand how ecological communities are responding. To increase generalization and applicability of biodiversity knowledge,18,19 it is thus crucial to reduce biases in ecological research, particularly in regions projected to face the most pronounced environmental changes. We integrate ecological community metadata of 7,694 sampling sites for multiple organism groups in a machine learning model framework to map the research probability across the Brazilian Amazonia, while identifying the region’s vulnerability to environmental change. 15%–18% of the most neglected areas in ecological research are expected to experience severe climate or land use changes by 2050. This means that unless we take immediate action, we will not be able to establish their current status, much less monitor how it is changing and what is being lost.
